Abstract

With the rapid development of medical information systems and software, clinical decision
support (CDS) systems continue to adopt new methods for handling diverse information
from heterogeneous sources, such as large volumes of electronic medical records (EMRs),
patient genomic data, existing genomic pharmaceutical databases, curated disease-specific
databases, and peer-reviewed research. As an avenue towards advanced clinical
decision-making, the TREC CDS track focuses on developing new techniques for finding
medical cases in the biomedical literature that are useful for patient care. Given the
volume of the existing literature and the diversity of the biomedical field, finding and
delivering relevant medical cases for a particular clinical need is a non-trivial task.
Moreover, understanding three different kinds of topics (i.e., diagnosis, test, and
treatment) and retrieving appropriate biomedical research articles for each is
challenging. To address these problems, we propose concept-based document re-ranking
approaches for clinical documents. We first use pseudo relevance feedback for query
expansion to retrieve an initial set of relevant documents. In addition, we consider two
concept-based re-ranking approaches that utilize popular external knowledge resources
(i.e., Wikipedia and UMLS) to improve biomedical information retrieval. Our concept-based
re-ranking approaches aim to bridge the gap between queries and biomedical research
articles at the semantic level.


1  Introduction 

The TREC Clinical Decision Support (CDS) track aims to investigate techniques for
linking medical cases to information in the published biomedical literature that is
relevant for patient care. The published biomedical literature, which can be searched
through PubMed, is a trustworthy and comprehensive source for exploratory analysis and
clinical decision-making support because it contains a large number of biomedical
research articles covering information such as patient demographics, laboratory test
results, radiology reports, clinical demonstrations, and medical treatments. The task of
CDS is to find biomedical research articles published in PubMed Central (PMC) for a
given query, where answering the query requires expertise in deciding how to treat a
patient. The track provides a PMC collection of 733,138 XML articles and 30 test
queries, each classified into one of three classes: diagnosis, test, and treatment.

In our participation in CDS, we propose concept-based document re-ranking approaches
for retrieving biomedical documents. First, a set of documents is obtained from an
initial search. Then, a first stage of re-ranking is performed using pseudo relevance
feedback, which expands the query. In the second stage, we apply concept-based
re-ranking approaches that utilize two different external knowledge resources (i.e.,
Wikipedia and UMLS) for more accurate biomedical information retrieval. Our
concept-based re-ranking approaches are intended to show the potential of external
knowledge resources for interpreting both the input queries and the retrieved
biomedical research articles at the semantic level, based on the concepts defined in
those resources.

The rest of this paper is organized as follows. Section 2 explains our proposed
approaches in detail. Section 3 presents experimental results for the different
approaches. In Section 4, we summarize our work and outline future research directions.


2  Method 

Our method re-ranks documents obtained from an initial search in two stages. In the
first stage, pseudo relevance feedback (PRF) is applied to obtain a more accurate
ranking by expanding the query based on the initial search results. In the second stage,
concept-based ranking is performed with two different medical resources. The following
subsections describe our method in detail.


2.1  Pseudo  Relevance  Feedback 


For a given query $Q$, a set of documents $D_{init} = \{D_1, D_2, \ldots, D_k\}$ is
retrieved from a document collection $COL$ using a search engine. We employ Lucene¹
with the query-likelihood method using Dirichlet smoothing. Then, ranking is performed
on $D_{init}$.

In this stage, the KL-divergence method is used to compute a similarity score between
a query and a document [10,13]:

$$score(Q, D) = \exp\left(-KL(\theta_Q \,\|\, \theta_D)\right) = \exp\left(-\sum_{w} p(w|\theta_Q) \log \frac{p(w|\theta_Q)}{p(w|\theta_D)}\right) \tag{1}$$

where $\theta_Q$ and $\theta_D$ are the query and document language models, respectively.

In general, a query model is estimated by maximum likelihood estimation (MLE) as
below:

$$p(w|\theta_Q) = \frac{c(w, Q)}{|Q|} \tag{2}$$

where $c(w, Q)$ is the count of a word $w$ in a query $Q$ and $|Q|$ is the number of
words in $Q$.

1 http://lucene.apache.org/

To avoid zero probabilities and improve retrieval performance, a document model
is estimated using Dirichlet smoothing [15]:

$$p(w|\theta_D) = \frac{c(w, D) + \mu \cdot p(w|COL)}{|D| + \mu} \tag{3}$$

where $c(w, D)$ is the count of a word $w$ in a document $D$, $|D|$ is the number of
words in $D$, $p(w|COL)$ is the probability of a word $w$ in the collection $COL$, and
$\mu$ is the Dirichlet prior parameter.
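The scoring in Eqs. 1 to 3 can be illustrated with a minimal Python sketch. It is not the Lucene implementation used in the experiments; the function names, the toy documents, and the choice of returning 0 when a query word never occurs in the collection are assumptions made for illustration.

```python
import math
from collections import Counter


def mle_query_model(query_tokens):
    """Eq. 2: p(w|theta_Q) = c(w, Q) / |Q|."""
    counts = Counter(query_tokens)
    return {w: c / len(query_tokens) for w, c in counts.items()}


def dirichlet_prob(word, doc_counts, doc_len, collection_model, mu):
    """Eq. 3: Dirichlet-smoothed p(w|theta_D)."""
    return (doc_counts.get(word, 0) + mu * collection_model.get(word, 0.0)) / (doc_len + mu)


def kl_score(query_tokens, doc_tokens, collection_model, mu=1500.0):
    """Eq. 1: exp(-KL(theta_Q || theta_D)); a larger value means a closer match."""
    query_model = mle_query_model(query_tokens)
    doc_counts = Counter(doc_tokens)
    kl = 0.0
    for w, p_q in query_model.items():
        p_d = dirichlet_prob(w, doc_counts, len(doc_tokens), collection_model, mu)
        if p_d == 0.0:
            # Assumption: a word unseen in the whole collection gives infinite divergence.
            return 0.0
        kl += p_q * math.log(p_q / p_d)
    return math.exp(-kl)


# Toy collection (made-up tokens, for illustration only).
docs = {
    "d1": ["cholera", "diarrhea", "treatment"],
    "d2": ["asthma", "inhaler", "treatment"],
}
all_tokens = [t for tokens in docs.values() for t in tokens]
collection_model = {w: c / len(all_tokens) for w, c in Counter(all_tokens).items()}

query = ["cholera", "treatment"]
ranking = sorted(
    docs,
    key=lambda d: kl_score(query, docs[d], collection_model, mu=10.0),
    reverse=True,
)
```

Because only words with non-zero query probability contribute to the sum in Eq. 1, the loop runs over the query vocabulary rather than the whole collection vocabulary.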

PRF is a popular way of expanding a query. It assumes that the top-ranked documents
$F = \{D_1, D_2, \ldots, D_{|F|}\}$ in the initial search results are relevant to a given
query and that terms in $F$ are useful for modifying the query into a better
representation. A relevance model (RM) estimates a multinomial distribution $p(w|Q)$,
the likelihood of a term $w$ given a query $Q$. The first version of the relevance model
(RM1) is defined as follows:


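$$\begin{aligned}
p_{RM1}(w|Q) &= \sum_{D \in F} p(w|\theta_D) \cdot p(\theta_D|Q) \\
&= \sum_{D \in F} p(w|\theta_D) \cdot \frac{p(Q|\theta_D) \cdot p(\theta_D)}{p(Q)} \\
&\propto \sum_{D \in F} p(w|\theta_D) \cdot p(\theta_D) \cdot p(Q|\theta_D)
\end{aligned} \tag{4}$$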


RM1 is composed of three components: the document prior $p(\theta_D)$, the document
weight $p(Q|\theta_D)$, and the term weight in a document $p(w|\theta_D)$. In general,
$p(\theta_D)$ is assumed to be uniform when nothing is known about a document $D$.
$p(Q|\theta_D) = \prod_{w \in Q} p(w|\theta_D)^{c(w,Q)}$ is the query-likelihood score.
$p(w|\theta_D)$ can be estimated using various smoothing methods such as Dirichlet
smoothing. Various strategies are applicable for estimating these components.

To improve retrieval performance, a new query model can be estimated by combining
a relevance model with the original query model. RM3 [1] is a variant of the relevance
model that estimates a new query model using RM1:

$$p(w|\theta'_Q) = (1 - \beta) \cdot p(w|\theta_Q) + \beta \cdot p_{RM1}(w|Q) \tag{5}$$

where $\beta$ is a control parameter that balances the original query model and the
feedback model.
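The following Python sketch illustrates Eqs. 4 and 5 under stated assumptions: the document prior is uniform, $p(w|\theta_D)$ is Dirichlet-smoothed, only terms that occur in the feedback documents are candidate expansion terms, and the feedback model is truncated to the top `num_terms` terms and renormalized. The function names and the toy data are hypothetical and do not reproduce the system used in the experiments.

```python
from collections import Counter


def smoothed_prob(word, counts, doc_len, collection_model, mu):
    """Dirichlet-smoothed p(w|theta_D)."""
    return (counts.get(word, 0) + mu * collection_model.get(word, 0.0)) / (doc_len + mu)


def rm1(query_tokens, feedback_docs, collection_model, mu=1500.0, num_terms=100):
    """Eq. 4 with a uniform document prior, normalized over the top terms."""
    weights = Counter()
    for doc_tokens in feedback_docs:
        counts = Counter(doc_tokens)
        doc_len = len(doc_tokens)
        # Query likelihood p(Q|theta_D) = prod_w p(w|theta_D)^c(w,Q).
        query_likelihood = 1.0
        for w in query_tokens:
            query_likelihood *= smoothed_prob(w, counts, doc_len, collection_model, mu)
        # The uniform prior p(theta_D) is a constant and cancels on normalization.
        for w in counts:
            weights[w] += smoothed_prob(w, counts, doc_len, collection_model, mu) * query_likelihood
    top_terms = dict(weights.most_common(num_terms))
    total = sum(top_terms.values())
    return {w: v / total for w, v in top_terms.items()} if total > 0 else {}


def rm3(query_tokens, feedback_docs, collection_model, beta=0.1, mu=1500.0, num_terms=100):
    """Eq. 5: interpolate the MLE query model with the RM1 feedback model."""
    query_model = {w: c / len(query_tokens) for w, c in Counter(query_tokens).items()}
    feedback_model = rm1(query_tokens, feedback_docs, collection_model, mu, num_terms)
    vocab = set(query_model) | set(feedback_model)
    return {
        w: (1 - beta) * query_model.get(w, 0.0) + beta * feedback_model.get(w, 0.0)
        for w in vocab
    }


# Toy feedback set (made-up tokens, for illustration only).
feedback_docs = [
    ["cholera", "diarrhea", "rehydration"],
    ["cholera", "treatment", "diarrhea"],
]
tokens = [t for d in feedback_docs for t in d]
collection_model = {w: c / len(tokens) for w, c in Counter(tokens).items()}
expanded = rm3(["cholera", "treatment"], feedback_docs, collection_model, beta=0.1, mu=10.0)
```

With a small $\beta$, the original query terms keep most of the probability mass, while terms such as "diarrhea" that co-occur in the feedback documents receive a small non-zero weight. For long queries, the product of probabilities may underflow, so a log-space computation would be a safer choice in practice.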


2.2  Concept-based  Ranking 


The key idea of concept-based IR is to find documents by representing them with
concepts rather than words. It is a popular solution to the synonymy and polysemy
problems that occur in IR tasks based on bag-of-words representations [6]. In general,
Wikipedia is utilized as a resource of concepts because it covers millions of
general-domain concepts, while UMLS, which contains medical-specific concepts including
those of SNOMED-CT and MeSH, is dominantly used in biomedical IR tasks. We utilize the
two resources in different ways for concept-based IR.


Concept mapping with Wikipedia. Wikipedia is utilized as a concept resource. We
assume that the subset of Wikipedia concepts relevant to the medical domain is useful
for CDS. To retain useful medical concepts, those belonging to the International
Classification of Diseases (ICD)-10² are selected for concept mapping. ICD-10 is a
hierarchical classification scheme of diseases and other health problems defined by the
World Health Organization (WHO). Thus, the coverage and granularity of concepts in
ICD-10 are assumed to be suitable for CDS. Unfortunately, not all ICD-10 concepts exist
in Wikipedia. Of more than 14,400 ICD-10 concepts, 7,162 are retained because they have
an article in Wikipedia. Fig. 1 shows the Wikipedia article for the ICD-10 concept
Cholera.

[Figure 1: screenshot of the Wikipedia article "Cholera". It shows the article text, an
infobox of classification codes (ICD-10 A00, ICD-9 001, DiseasesDB 29089, MedlinePlus
000303, eMedicine med/351, MeSH D002771), and the category list at the end of the
article: Cholera, Biological weapons, Diarrhea, Foodborne illnesses, GI tract disorders,
Intestinal infectious diseases, Neglected diseases, Pandemics, Waterborne diseases.]

Fig. 1. An example Wikipedia article of ICD-10 concept Cholera³

2 http://apps.who.int/classifications/icd10/browse/2010/en
3 http://en.wikipedia.org/wiki/Cholera


Based on the selected concepts, ranking is performed by scoring documents with the
concept mapping method introduced in [7]. That method applies concept mapping with
Wikipedia to document clustering; in our case, we rank documents rather than cluster
them. A document is represented by a word vector. Words are stemmed and lower-cased
after stop-words are removed using a stop-word list⁴. The words in the word vector are
mapped to ICD-10 concepts. In addition, a category vector can be derived from a concept
vector by a similar mapping, because the Wikipedia article corresponding to a concept
lists a set of categories at its end, as shown in Fig. 1. This amounts to decomposing a
document-category matrix into three components, the document-word, word-concept, and
concept-category matrices, as shown in Fig. 2. Entries of the document-word matrix are
filled with standard TF-IDF values, while entries of the other two matrices are filled
with modified versions of TF-IDF values for concepts and categories.
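The chained mapping from words to concepts to categories can be sketched as two sparse matrix products. The sketch below is only an illustration: the weights are invented placeholders rather than the TF-IDF variants of [7], and the helper name `project` is hypothetical.

```python
def project(vector, mapping):
    """Multiply a sparse row vector by a sparse matrix.

    vector:  {row_key: value}
    mapping: {row_key: {column_key: weight}}
    """
    out = {}
    for key, value in vector.items():
        for target, weight in mapping.get(key, {}).items():
            out[target] = out.get(target, 0.0) + value * weight
    return out


# Document-word row: TF-IDF weights of one document (made-up values).
word_vec = {"cholera": 0.8, "diarrhea": 0.5}

# Word-concept matrix: how strongly each word indicates an ICD-10 concept (made-up values).
word_concept = {
    "cholera": {"Cholera": 1.0},
    "diarrhea": {"Cholera": 0.4, "Diarrhea": 1.0},
}

# Concept-category matrix: Wikipedia categories of each concept article (made-up values).
concept_category = {
    "Cholera": {"Waterborne diseases": 1.0, "Intestinal infectious diseases": 1.0},
    "Diarrhea": {"Intestinal infectious diseases": 0.5},
}

concept_vec = project(word_vec, word_concept)          # document-concept row
category_vec = project(concept_vec, concept_category)  # document-category row
```

Applying the same two projections to a query yields its concept and category vectors, so that queries and documents can be compared in each of the three representations.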


[Figure 2: the document-category matrix written as the product of the document-word,
word-concept, and concept-category matrices.]

Fig. 2. Decomposition of document-category matrix

Fig. 3. Final score computation by combining three different scores

Therefore, as shown in Fig. 3, we can compute three scores based on the different
representations of a document and a query using the cosine similarity function.

A final score is computed by a linear combination of the three scores:

$$score(Q, D) = \alpha_1 \cdot sim_{word}(Q, D) + \alpha_2 \cdot sim_{concept}(Q, D) + \alpha_3 \cdot sim_{category}(Q, D) \tag{6}$$

where $0 < \alpha_1, \alpha_2, \alpha_3 < 1$ and $\alpha_1 + \alpha_2 + \alpha_3 = 1$.

In this paper, we set them uniformly as $\alpha_1 = \alpha_2 = \alpha_3 = 1/3$.
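As a minimal sketch of Eq. 6, the code below computes the cosine similarity for each of the three representations and combines the results with uniform weights. The sparse-dictionary representation, the function names, and the toy vectors are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(query_reps, doc_reps, alphas=(1 / 3, 1 / 3, 1 / 3)):
    """Eq. 6: weighted sum of word, concept, and category similarities.

    query_reps and doc_reps are (word_vec, concept_vec, category_vec) tuples.
    """
    return sum(a * cosine(q, d) for a, q, d in zip(alphas, query_reps, doc_reps))


# Made-up representations, for illustration only.
query = ({"cholera": 1.0}, {"Cholera": 1.0}, {"Waterborne diseases": 1.0})
doc_same = ({"cholera": 2.0}, {"Cholera": 0.5}, {"Waterborne diseases": 3.0})
doc_word_only = ({"cholera": 1.0}, {}, {})
doc_unrelated = ({"asthma": 1.0}, {"Asthma": 1.0}, {"Respiratory diseases": 1.0})

scores = [combined_score(query, d) for d in (doc_same, doc_word_only, doc_unrelated)]
```

A document that matches the query only at the word level receives one third of the maximum score under uniform weights, which shows how the concept and category representations can separate documents that look alike as bags of words.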

Concept mapping with UMLS. The Unified Medical Language System (UMLS) [4] is
utilized as a domain-specific concept resource. It contains about 900,000 biomedical
concepts integrated from various resources such as the NCBI taxonomy, Gene Ontology,
Medical Subject Headings (MeSH), OMIM, and the Digital Anatomist Symbolic Knowledge
Base. In addition, it also contains the mapping between about 900,000 concepts and
over 2 million. Due to its large volume, UMLS is often utilized for concept-based IR in
the biomedical domain [8,12]. We employ MetaMap [2] to identify UMLS concepts in texts.
One advantage of using MetaMap is that we can take the negation of concepts into
consideration. Dealing with negations has been an important issue in other research
[3,9,11,14]. Whereas most previous work simply penalizes negated concepts, we handle
the negation of a concept in a document with respect to its polarity in the query.

4 http://mallet.cs.umass.edu

Consider a query $Q$ and two candidate documents $D_1$ and $D_2$. Suppose that a
concept $C$ is contained in $Q$ and that $D_1$ and $D_2$ contain the same number of
occurrences of $C$. If $C$ is mostly negated in $D_1$ while it is mostly affirmed in
$D_2$, it is natural to rank $D_2$ higher than $D_1$ in the search results. On the other
hand, if $C$ appears in negated form in $Q$, $D_1$ should be ranked higher than $D_2$.
Based on this idea, we propose the following strategy for handling the negations of
concepts:

1. If a concept negated in a query appears in a document with affirmation, decrease
the score of the document with respect to the query.

2. If a concept negated in a query appears in a document with negation, increase the
score of the document with respect to the query.

3. Take into account the number of times a concept is affirmed or negated in a
document.

Negations  are  identified  using  NegEx  [5]  which  is  embedded  in  MetaMap. 

In order to highlight the effect of the proposed strategy, we selected six UMLS
semantic types that are often negated in the test queries and used only the UMLS
concepts belonging to those types for document re-ranking. Table 1 shows the selected
semantic types. The selected semantic types do not include those related to
qualification, such as qualitative concepts, spatial concepts, or body locations and
regions. Although MetaMap often proposes groups of concepts for a given phrase⁵, the
characteristics of the selected semantic types allow us to focus on individual concepts
rather than groups of concepts.


Table 1. Selected UMLS Semantic Types

Semantic Type            Abbreviation   ID
Disease or Syndrome      dsyn           T047
Finding                  fndg           T033
Sign or Symptom          sosy           T184
Pathologic Function      patf           T046
Injury or Poisoning      inpo           T037
Anatomical Abnormality   anab           T190

5 For instance, for the phrase “mild dyspnea”, MetaMap proposes a concept group
that consists of two concepts: 1) concept ‘mild’ of the ‘Qualitative Concept’ semantic
type and 2) concept ‘dyspnea’ of the ‘Sign or Symptom’ semantic type.

Given a document $D$, a concept vector $CV_D = \{v_1, v_2, \ldots, v_n\}$ is
constructed, where

$$v_i = \sum_{p \in D} \sum_{j=1}^{k} \frac{Neg_{i,j,p} \cdot Conf_{i,j,p}}{k}$$

Here, $p$ represents a phrase that conveys a biomedical concept, and $k$ represents the
number of candidates (MetaMap mappings) proposed for $p$. $Conf_{i,j,p}$ is the
confidence score for the $i$-th concept in the $j$-th candidate for $p$. We merge all
the candidates into a single normalized version rather than selecting the most probable
one among them. We assume that MetaMap produces the same candidates when it is given
the same phrase in similar contexts. $Neg_{i,j,p}$ is a term that handles negations as
proposed above. Its value is -1 if the $i$-th concept in the $j$-th candidate for $p$ is
identified as negated, and 1 if it is affirmed. The concept vector for $Q$ is
constructed in the same way. Then, the cosine similarity between the two concept
vectors is computed. The final score is a combination of the score from PRF and the
cosine similarity:

$$score(Q, D) = score_{PRF}(Q, D) + \alpha \cdot sim(Q, D) \tag{7}$$

where $\alpha > 0$ is the weight of the similarity from concept mapping.
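The negation-aware concept vector and Eq. 7 can be sketched as follows. The input structure is an assumption for illustration and is not MetaMap's actual output format: each phrase is a list of candidate mappings, and each candidate is a list of `(concept_id, confidence, negated)` tuples. The concept identifiers and confidence values are invented.

```python
import math
from collections import defaultdict


def concept_vector(phrases):
    """Sum Neg * Conf / k over all phrases and their k candidate mappings."""
    vec = defaultdict(float)
    for candidates in phrases:
        k = len(candidates)
        for candidate in candidates:
            for concept_id, confidence, negated in candidate:
                sign = -1.0 if negated else 1.0
                vec[concept_id] += sign * confidence / k
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def final_score(prf_score, query_vec, doc_vec, alpha=1.0):
    """Eq. 7: PRF score plus the weighted concept similarity."""
    return prf_score + alpha * cosine(query_vec, doc_vec)


def phrase(concept_id, negated):
    """Helper: a phrase with a single candidate holding one concept (confidence 1.0)."""
    return [[(concept_id, 1.0, negated)]]


# D1 mostly negates 'dyspnea'; D2 mostly affirms it. Both affirm 'fever' once.
d1 = concept_vector([phrase("dyspnea", True), phrase("dyspnea", True),
                     phrase("dyspnea", False), phrase("fever", False)])
d2 = concept_vector([phrase("dyspnea", False), phrase("dyspnea", False),
                     phrase("dyspnea", True), phrase("fever", False)])

q_negated = concept_vector([phrase("dyspnea", True)])
q_affirmed = concept_vector([phrase("dyspnea", False)])
```

Because negated occurrences contribute negative weight, a query and a document that agree in polarity have a positive cosine similarity and the document's score rises, while disagreement yields a negative similarity and the score falls. This matches the three-point strategy listed earlier in this section, with occurrence counts entering through the sum over phrases.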


3  Results 

Table 2 describes the runs submitted for evaluation. Run1 is obtained using the
language model with Dirichlet smoothing implemented in Lucene. In Run2, re-ranking with
PRF is performed on the initial search results. Run3, Run4, and Run5 are the results of
concept mapping with Wikipedia (Run3) and UMLS (Run4 and Run5). In all runs, 1,000
documents are retrieved and re-ranked for each query. For PRF with RM, the numbers of
feedback documents and words are set to 10 and 100, respectively. The Dirichlet prior
($\mu$) and the RM3 mixture weight ($\beta$) are set to 1,500 and 0.1, respectively.


Table 2. Descriptions of submitted runs for evaluation

ID     Description
Run1   Initial search
Run2   Initial search + pseudo relevance feedback
Run3   Initial search + pseudo relevance feedback + Wikipedia
Run4   Initial search + pseudo relevance feedback + MetaMap (α = 1)
Run5   Initial search + pseudo relevance feedback + MetaMap (α = 2)


Table 3 summarizes the performance of the five runs. Run2, Run4, and Run5 improve
slightly over Run1 in MAP, infAP, and infNDCG, while Run3 degrades on every measure.
We think that the degradation in Run3 comes from improper concept mapping to ICD-10
concepts in Wikipedia; our restriction to ICD-10 may result in insufficient concept
coverage (about 7,000 concepts). Run4 and Run5 show that concept mapping to UMLS
slightly improves performance. However, the gains are not as high as we expected. We
think that these small improvements show the limitation of re-ranking the initial
search results. According to our investigation, the initial search results contain
relatively few of the documents judged as relevant. Thus, it is necessary to run the
initial search with query expansion, so that the candidate set contains more relevant
documents, before performing re-ranking.


Table 3. Summary of evaluation results

Measure   Run1     Run2     Run3     Run4     Run5
map       0.1054   0.1085   0.0933   0.1086   0.1086
R-prec    0.1665   0.1667   0.1443   0.1666   0.1659
P10       0.2933   0.2933   0.2200   0.2933   0.2933
infAP     0.0462   0.0491   0.0424   0.0492   0.0492
infNDCG   0.1911   0.193    0.1759   0.1946   0.1938

4  Conclusion 

For the TREC Clinical Decision Support track, we proposed two concept-based re-ranking
approaches that utilize Wikipedia and UMLS as concept resources. We observed small
performance improvements from concept-based re-ranking with UMLS (i.e., MetaMap).
However, a number of unresolved issues must be tackled to achieve higher performance.
As future work, we plan to develop more effective ways of utilizing biomedical
knowledge resources and a more sophisticated negation handling strategy for advanced
concept-based ranking.